Maximizing Big Data's Value by Asking the Right Questions
How BI is built and managed just isn't consistent with the way it's used. Are we about to reprise the exact same mistakes in the world of big data (mis)management?
- By Stephen Swoyer
- January 20, 2015
Peter Evans, a senior integrated solutions consultant with Dell Inc.'s Information Management group, has a highly sensitive BS detector. It's a faculty that Evans says he honed during his decades of service in the UK's Royal Navy, where he worked on nuclear submarines.
Evans says he responds with a simple two-part question whenever anyone -- a client, a colleague, a random attendee at an industry event -- asks him about big data, which happens a lot.
"Whenever I get to the point where somebody's asking about a big data system, I ask them: 'What are the questions you want to use this big data system to ask? How are they different from the questions you're asking now?'" he explains. "If you don't understand the questions you want to ask, how can you make a decision about whether you need big data? Because your gut tells you so? Because you read somewhere -- in a magazine on a plane -- that you need big data?"
There's potential value in big data, Evans argues, and then some. For many organizations, however, the small stuff still isn't getting done right. The basic questions aren't being asked, let alone answered. Part of the responsibility for this can be laid at IT's doorstep, he says: access is still too locked down and business intelligence (BI) tools are just starting to become usable -- at least in the sense that business people enjoy using them. Fundamentally, Evans maintains, the way BI is built and managed isn't consistent with the way it's used. "The problem as I see it is that sales or finance or marketing want to do ... stuff with data that they can't do with the data warehouse," he indicates.
"They're more interested in large-scale data sets, they're more interested in [incorporating data from] social media. For an airline that wants to analyze ticket sales, they're not interested in 100 percent accuracy of that data; what they're interested in is the trend. If you're doing a trend analysis [on ticket prices], you need a certain level of accuracy, you don't need absolute accuracy."
In most cases, Evans argues, an airline sales analyst wants more data. Not just raw data, but different types of data. "For [an analysis of airline ticket sales], you're also interested in geolocation data, which you didn't even used to have in the warehouse," he comments. "The cleanliness or consistency of the data isn't as much of an issue here. It's throwaway data, in a sense: if the data is going to be kept and re-analyzed and ... historical analysis is going to be done on it, then the data has to be cleaner because you're going to go back to it time and time again."
The problem also has to do with the way we build and package BI for business people, he continues: we expect that once we build a view or a report, that's it, it's done. It doesn't need to be changed and it can be filed away and forgotten about. However, the very function of BI is to permit the business to make intelligent decisions about itself: to give the business the information it needs to change and optimize its processes. When you do that, what you're reporting on or what you're analyzing must and will change, too.
"This is one way in which big data has been very valuable. It's forcing us to recognize this. Not only are we collecting data that we wouldn't even think about persisting [in the data warehouse], but we're consciously using it to change how we run the business."
He cites the example of a Dell customer, Kentucky State University (KSU).
"We built a system for them that shows how their students are doing performance-wise, and ... it's designed to show them when their students are going to walk away from the grid based on their standings [in their classes]," he explains. This is an example of a traditional application of something like BI, Evans concedes, but data from social media permits KSU to do much more. "What they're thinking about is tying social media into that, too, to find the link and the cause and effect between social factors and the actual result case of a student not doing well in college."
As a result of these insights, KSU will change not only its policies with regard to student performance but, especially, the interventions it prescribes based on how it projects that a student will perform. Data from social media will give KSU much more insight into why its students are failing, and KSU can use that data to tailor its policies and interventions more appropriately. It will also introduce new complexities, particularly with respect to privacy and governance.
This, in turn, will drive changes in its traditional reporting and analytic infrastructure, but these changes must be done responsibly and ethically, Evans points out. "We can't ignore that data is only going to keep on growing and that we're going to want to do more and different things with it. More importantly, we can't ignore the fact that with our use of this [data] comes new privacy and security concerns, which, based on your industry, will drive what you do with the data and how you govern it."
Governance is perhaps the least sexy topic on the planet, Evans concedes. This has to do with the fact that IT traditionally practiced a kind of overly onerous governance, particularly in the case of the data warehouse and its clean and consistent data. This was a case of IT exercising too much control, he argues; more recently, and with big data, especially, the pendulum has swung the other way -- toward a model in which IT has too little control over data.
In some cases, Evans argues, you're seeing an utter disregard for control: "Somebody's got to recognize and start talking about the fact that there are security problems, there are data quality problems, there are master data management problems all around how we're building these hybrid data ecosystems."
"People … are so focused on delivering a big data project that nobody's quite gotten around to the problem of how do we actually implement a metadata standard or a metadata 'universal translator' for this kind of thing," Evans concludes, invoking the Star Trek concept of a "universal translator" that can translate between and among (idiomatic) languages. "For a hybrid ecosystem with a data warehouse, NoSQL, a graph database, and Hadoop to succeed, the primary driver is going to be a metadata translation layer. You can't even begin to look at metadata governance in a proper way if you don't understand how the data is linked between individual systems."